Bias Correction and Confidence Intervals for Fitted Q-iteration
Abstract
We consider finite-horizon fitted Q-iteration with linear function approximation to learn a policy from a training set of trajectories. We show that fitted Q-iteration can give biased estimates and invalid confidence intervals for the parameters that feature in the policy. We propose a regularized estimator called the soft-threshold estimator, derive it as an approximate empirical Bayes estimator, and show via simulated experiments that it reduces bias and improves the coverage rates of confidence intervals. We also demonstrate the use of this method in the analysis of data from a randomized smoking cessation study.
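For orientation, the baseline procedure named in the abstract, finite-horizon fitted Q-iteration with linear function approximation, can be sketched as backward induction over stages with an ordinary least-squares fit at each stage. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature map `features`, the binary action coding of -1/+1, and the array layout are all hypothetical choices made here. The soft-threshold estimator is not shown, because the abstract does not give its form.

```python
import numpy as np

def features(state, action):
    # Hypothetical feature map (an assumption, not from the paper):
    # intercept, state, action, and state-action interaction.
    return np.stack([np.ones_like(state), state, action, state * action], axis=-1)

def fitted_q_iteration(states, actions, rewards):
    """Finite-horizon fitted Q-iteration with a linear Q-function per stage.

    states, actions, rewards: float arrays of shape (n_trajectories, n_stages).
    Actions are assumed to be coded as -1.0 or +1.0.
    Returns a list with one least-squares parameter vector per stage.
    """
    n, n_stages = states.shape
    thetas = [None] * n_stages
    next_value = np.zeros(n)  # no value remains after the final stage
    for t in reversed(range(n_stages)):
        X = features(states[:, t], actions[:, t])
        y = rewards[:, t] + next_value  # regression target for stage t
        theta, *_ = np.linalg.lstsq(X, y, rcond=None)
        thetas[t] = theta
        # Stage-t value: maximize the fitted Q-function over the two actions.
        q_plus = features(states[:, t], np.ones(n)) @ theta
        q_minus = features(states[:, t], -np.ones(n)) @ theta
        next_value = np.maximum(q_plus, q_minus)
    return thetas
```

The estimated policy at each stage picks the action with the larger fitted Q-value, so the parameters that "feature in the policy" are the entries of each stage's `theta` that multiply the action terms.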
Related resources
CFQI: Fitted Q-Iteration with Complex Returns
Fitted Q-Iteration (FQI) is a popular approximate value iteration (AVI) approach that makes effective use of off-policy data. FQI uses a 1-step return value update which does not exploit the sequential nature of trajectory data. Complex returns (weighted averages of the n-step returns) use trajectory data more effectively, but have not been used in an AVI context because of off-policy bias. In ...
Uncertainty quantification in unfolding elementary particle spectra at the Large Hadron Collider
This thesis studies statistical inference in the high energy physics unfolding problem, which is an ill-posed inverse problem arising in data analysis at the Large Hadron Collider (LHC) at CERN. Any measurement made at the LHC is smeared by the finite resolution of the particle detectors and the goal in unfolding is to use these smeared measurements to make nonparametric inferences about the un...
A Bayesian approach to type-specific conic fitting
A perturbative approach is used to quantify the effect of noise in data points on fitted parameters in a general homogeneous linear model, and the results applied to the case of conic sections. There is an optimal choice of normalisation that minimises bias, and iteration with the correct reweighting significantly improves statistical reliability. By conditioning on an appropriate prior, an unb...
Monte Carlo Comparison of Approximate Tolerance Intervals for the Poisson Distribution
The problem of finding tolerance intervals has received considerable attention from researchers, and tolerance intervals are widely used in various statistical fields, including biometry, economics, reliability analysis and quality control. A tolerance interval is a random interval that covers a specified proportion of the population with a specified confidence level. In this paper, we compare approximate tolerance interva...
On the Effect of Bias Estimation on Coverage Accuracy in Nonparametric Estimation
This paper studies the effect of bias correction on confidence interval estimators in the context of kernel-based nonparametric density estimation. We consider explicit plug-in bias correction but, in contrast to standard approaches, we allow the bias estimator to (potentially) have a first-order impact on the distributional approximation. This approach is meant to more accurately capture the f...